ABSTRACT

The ongoing growth in the volume of scientific literature available today precludes researchers from efficiently discerning relevant from irrelevant content. Researchers are constantly interested in impactful papers, authors and venues in their respective fields. Moreover, they are interested in the so-called recent “rising stars” of these contexts, which may lead to attractive directions for future work, collaborations or impactful publication venues. In this work, we address the problem of quantifying research impact in each of these contexts, in order to better direct the attention of researchers and streamline the processes of comparison, ranking and evaluation of contribution. Specifically, we begin by outlining intuitive underlying assumptions that impact quantification methods should obey, and evaluate when current state-of-the-art methods fail to satisfy these properties. To this end, we introduce the s-index metric, which quantifies research impact through influence propagation over a heterogeneous citation network. The s-index is designed around these intuitive assumptions and offers a number of desirable qualities, including robustness, natural temporality and straightforward extensibility from the paper impact context to the broader author and venue impact contexts. We evaluate its effectiveness on the publicly available Microsoft Academic Search citation graph, with over 119 million papers and 1 billion citation edges, and 103 million and 21 thousand associated authors and venues, respectively.

1. INTRODUCTION 

The publication and circulation of influential work has served as the cornerstone of research practice since the inception of scientific discovery itself. Both budding and veteran researchers are known for the quantity and quality of the work they produce, and moreover by the mark, or impact, that they leave on the scientific community. Furthermore, influential works inspire future endeavors by means of the results and ideas they put forth - this phenomenon of incremental discovery is colloquially referred to by the phrase “standing on the shoulders of giants.” However, given the continued growth in the sheer amount of literature available today, researchers are deluged with more and more irrelevant information, from which they seek only a small fraction. This makes the measurement of the scientific impact of relevant papers, authors and venues an important problem.

*This work was done while working at Microsoft Research Redmond.

Scientific impact is a central tenet in the evaluation of research success. While quantitative metrics are not a replacement for carefully reading an author’s works and evaluating peer regard, they are frequently used in practice to provide at-a-glance, summary information. For example, researchers, departments, institutions and venues are commonly evaluated using a number of numerical impact metrics, including citation count [36], h-index [?] and journal impact factor [?]. These metrics are typically used for crucial decisions from a management perspective, including appointing academic posts, assigning tenure, awarding prizes and electing candidates for prestigious academies. Similarly, from a researcher’s perspective, these metrics can play a major role in determining relevant previous work, future research directions, potential collaborations and strategic publication venues from a citation and recognition perspective.

However, impact quantification is not a trivial problem, particularly because the concept of impact is itself not precisely defined. Various impact metrics focus on different definitions of impact. For example, paper impact is frequently judged using simple citation count - thus, papers with a large number of citations are considered the most impactful. Author impact has traditionally been measured using the h-index, which incorporates (to some extent) both the quantity of papers and the quantity of citations they receive as a measure of quality. Venue impact is often defined using the impact factor, which considers the average number of citations received per paper over the last 2 years for each venue. These definitions capture distinct concepts of impact built on inherently different assumptions. Hence, it is unsurprising that current state-of-the-art metrics exhibit a number of inconsistencies and unexpected behaviors in practice. In fact, numerous works from different fields, including the social sciences, bibliometrics and physics, establish notable problems with evaluation arising from the use of these metrics in practice [?]. We argue that, given the career and livelihood ramifications of these metrics, it is important for impact metrics to be principled and in line with human understanding. In this work, we focus on exactly this problem of developing improved, powerful and practical metrics for effectively quantifying research impact.

We begin by identifying a number of desiderata that good impact metrics should obey. These attributes have firm grounding in human intuition about how impact should be manifested by different entities (in this work, we consider the impact of papers, authors and venues). We next identify relevant prior works and current state-of-the-art metrics used to quantify impact in practice, and evaluate when these metrics fail to satisfy intuitive properties. In response, we build the necessary groundwork for, and propose, the s-index metric, which is designed to exhibit these traits. Our approach computes paper impact by modeling influence propagated by papers over a paper-paper citation network (see Figure 1) and extends this principle to associated author and venue nodes in a heterogeneous paper-author-venue citation network. The s-index offers numerous comparative advantages over existing state-of-the-art impact metrics and emphasizes both the quantity and the quality of research, while adhering to a number of important properties related to the tradeoff between the two.

Figure 1: s-index measures entity impact by modeling influence propagated by papers over citation networks. Edge $p_1 \to p_2$ denotes that $p_2$ cites $p_1$, and edge weights denote influence from the center (source) node reaching a receiver.

Our work offers a number of notable contributions towards solving the problem of research impact quantification:

1. Analysis: We identify a number of features that impact metrics should obey in order to be employed in practice, and analyze when current state-of-the-art metrics do not perform accordingly.

2. Algorithm: We build intuition for and propose a fast and scalable algorithm to compute the s-index metric, which quantifies research impact based on influence propagation over a citation graph and is currently deployed at Microsoft.

3. Evaluation: We evaluate the s-index on the large Microsoft Academic Search citation graph, with over 119 million papers, 1 billion citation edges, 103 million authors and 21 thousand venues, and show promising results in practice.

Reproducibility: The Microsoft Academic Search data used in our work is freely available at research.microsoft.com/en-us/projects/mag/. Both MATLAB and MS-SQL relational implementations of our algorithm are made available at cs.cmu.edu/~neilshah/code/sindex.tar.gz.

2. PROPOSED DESIDERATA 

In order to build an improved impact metric, it is important to identify how we expect impact to be defined in practice. The Economic and Social Research Council (ESRC) defines academic research impact as follows:

Academic impact is the demonstrable contribution that excellent social and economic research makes to scientific advances, across and within disciplines, including significant advances in understanding, method, theory and application. [?]

Beyond this broad definition, there are few real constraints on how research impact is construed. In practice, impact is traditionally defined using available information about citations: the references which publications (henceforth referred to as papers) make to each other to signify drawing from, or leveraging, other works in their own. In their simplest form, citations are direct links which convey the number of “uses” of a paper. However, simply tallying the number of citations received by various papers, authors and venues makes for a very elementary impact metric, which can only be considered a first-order approximation of the notion of “contribution” referred to in the above definition. In reality, impact is both a direct and an indirect phenomenon, and metrics which quantify impact should account for this property, amongst several others.

In this section, we pose the question: what makes a good impact metric? To answer it, we propose and define the following desiderata, which should be considered when designing and using an impact metric: (a) volume sensitivity, (b) prestige sensitivity, (c) extensibility, (d) temporality, (e) interpretability, (f) robustness and (g) computability. While we discuss the properties in the context of paper impact, the principles extend naturally to the broader author and venue contexts.

2.1 Volume Sensitivity 

Volume sensitivity reflects the concept that the more a work is cited, the more impactful it is. This does not imply that direct citation count is the best impact metric, but rather that between two otherwise equivalent papers a and b, a is more impactful if it has an extra citation over b. Thus, additional citations never hurt impact; they only help it. This property is intuitive and is the fundamental tenet of citation counting for impact.

2.2 Prestige Sensitivity 

Prestige sensitivity captures the idea that impact metrics should weigh citations from different papers differently. In other words, not all citations are considered equal. The intuition behind this property is that citations from “high-quality” papers should matter more than those from “low-quality” ones (for example, a widely-acclaimed seminal work in a famous journal versus an uncited and unpublished work posted online). Typically, the quality of citing papers is defined recursively in some fashion by their own citation count. Most traditionally used impact metrics offer no or very limited prestige sensitivity.

2.3 Extensibility 

Extensibility refers to the idea that the principles by which an impact metric is defined should extend to the impact definitions for other entities. Specifically, the property of extensibility ensures that impact is defined in a unified and coherent way across papers, authors and venues, rather than in a segmented way that only applies to certain entities. It is both perplexing and unintuitive that currently used state-of-the-art impact metrics convey effectively different measures of impact for papers, authors and venues. We argue that papers are the fundamental building block of impact, and authors and venues are simple aggregates of their associated papers. An author cannot have impact without the papers which he writes, nor can a venue have impact without the papers which it publishes to the scientific community at large.

2.4 Temporality 

Impact can be viewed in two ways. Firstly, impact can be considered as a static, “lifetime achievement award” which quantifies total influence from inception onwards. This could be considered as the total impact of a paper since its publication, or the total impact of an author or venue since its first publication. Secondly, impact can be examined as a dynamic, ever-changing property which quantifies recent influence. For example, a paper written in the 1600s may have a large static impact, but its dynamic impact may wane over time for a variety of reasons, including declining interest in the specific work or the field, or a shift towards newer results. We argue that while impact metrics should be extensible across entity contexts, they should also be extensible over time and offer dynamic counterparts with similar principles - we refer to this idea of extensibility over time as temporality.

Impact metrics which offer temporality are particularly useful because they can measure (by definition) how influence changes over time, and thus offer a means of measuring recent productivity and popularity. This is especially useful for comparing papers, authors and venues at different points in their respective careers. Furthermore, temporality is associated with predictability, as entities with high recent impact are intuitively expected to coincide with so-called “rising stars.”
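To make the static/dynamic distinction concrete, here is a minimal Python sketch that uses plain citation count purely as a stand-in for whatever underlying metric is being made temporal. The `Citation` record, the choice to date a citation by the citing paper's publication year, and the `window` parameter are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    citing: str  # paper making the citation
    cited: str   # paper receiving the citation
    year: int    # publication year of the citing paper (assumed citation date)


def static_impact(citations, paper):
    """Lifetime ("static") impact: every citation `paper` has ever received."""
    return sum(1 for c in citations if c.cited == paper)


def dynamic_impact(citations, paper, current_year, window):
    """Recent ("dynamic") impact: only citations from the last `window` years."""
    return sum(
        1
        for c in citations
        if c.cited == paper and current_year - window < c.year <= current_year
    )


cites = [
    Citation("a", "old", 1990),
    Citation("b", "old", 1995),
    Citation("c", "old", 2000),
    Citation("d", "new", 2013),
    Citation("e", "new", 2014),
]
print(static_impact(cites, "old"), dynamic_impact(cites, "old", 2014, 2))  # 3 0
print(static_impact(cites, "new"), dynamic_impact(cites, "new", 2014, 2))  # 2 2
```

In this toy data the older paper dominates on static impact while the newer one dominates on dynamic impact, which is the kind of gap the text associates with “rising stars.”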

2.5 Interpretability 

Metrics can be arbitrarily simple or complex. Often, more complex models which rely on multiple data sources offer more expressiveness than simpler ones. In the impact metric context, one can consider the simplest metric to be citation count. A more complicated (though perhaps richer) metric might instead account for a variety of factors, such as auxiliary author/institution/venue features or semantic similarity with previous works, or produce tuples instead of single numbers. However, one must keep in mind that impact metrics are meant to be used by humans - thus, they must offer interpretability. In practice, these metrics are used not only to rank, but also to compare and predict. It is of utmost importance that those who use them have some understanding of how impact is being quantified, and of what they are comparing and predicting.

2.6 Robustness 

When a metric is introduced as a quantifier, its value as a measure immediately begins to decline. This consequence was initially formulated in the economic context and is known as Goodhart’s Law, which states that “when a measure becomes a target, it ceases to be a good measure,” originally in reference to investors acting in ways that let them benefit from economic regulations. Analogously, as impact metrics are proposed, researchers will seek to adopt practices which enable them to capitalize on the metric and improve their rankings. Thus, it is important that impact metrics are robust, i.e., difficult to rig or game by means of disreputable practices (such as unwarranted self-citation, double publication and citation trading).

In practice, self-citation can be used both legitimately and illegitimately, and it can be difficult to automatically discern between the two. Thus, we argue that impact metrics which explicitly penalize self-citation are not ideal. Rather, it is more promising to measure impact in a way which diminishes the incentive to self-cite illegitimately - note that this is inherently impossible with metrics that lack prestige sensitivity and treat all citations equally.

2.7 Computability 

Good impact metrics should be easily computable. Citation networks and their more complex, heterogeneous representations are constantly growing with the volume of available literature. Impact metrics which are impractically expensive or difficult to compute, no matter how expressive or even interpretable they are, are simply not practically usable. Commonly used online citation and ranking databases such as Microsoft Academic Search [23], Google Scholar [16], CiteSeerX [24] and ArnetMiner [31] deal with very large datasets and require frequent updates to impact metrics given the frequency with which they index new articles - thus, requiring several days or longer to compute metric scores is an unattractive option. Computability is an especially important consideration for complex metrics which incorporate costly operations such as semantic similarity and content-based approaches, centrality metrics and slow-converging graph algorithms. Furthermore, content-based approaches which use topic modeling or other randomized algorithms are approximate, meaning that the resulting impact scores can differ substantially between runs even on the same dataset - this is, of course, undesirable.

3. PRIOR WORK & ANALYSIS 

3.1 Prior Work 

3.1.1 Impact Metrics 

Citation count is perhaps the oldest and most commonly used metric for measuring research paper impact; [?] notes that Google Scholar considers citation count to be the highest weighted factor for paper ranking. Ranking by citation count is a common but controversial practice in that it reinforces the rich-get-richer concept (the Matthew effect). [32] proposes the CiteRank algorithm for paper ranking, which is similar to Google’s PageRank algorithm [18] but distributes random surfers exponentially with age, favoring more recent works. This biases against older papers, which is an unintuitive assumption for overall impact calculation. [?] also uses PageRank to assess the importance of papers published in Physical Review journals.

[?] proposes the h-index to quantify an author’s research output. To compare researchers with different career lengths, the m quotient, derived by dividing h by the length of the author’s academic career, is also suggested. [?] proposes the g-index as an alternative which more heavily accounts for an author’s top contributions, which may have disproportionately more citations than his less popular papers. The A, R and AR-indices defined in [19] and [20] use variants of the mean citations of popular papers in the Hirsch core to capture the average impact of an author’s high-performing papers, in order to penalize authors with a high h-index less. Google Scholar recently introduced the i10-index [?], defined as the number of papers with 10 or more citations. [?] uses factor analysis to classify these indices into two main types, which emphasize work quantity and quality. These groups represent the concepts of volume sensitivity and prestige sensitivity, respectively.
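The g-index and i10-index mentioned above differ from the h-index mainly in how they threshold an author's citation distribution. The Python sketch below assumes the common convention that the g-index is the largest g whose g most-cited papers have at least g² citations in total, capped here at the number of papers (some variants pad the record with zero-citation papers); both functions are illustrations, not reference implementations.

```python
def g_index(citations):
    """Largest g such that the g most-cited papers together have >= g*g citations.

    Convention assumed here: g is capped at the number of papers.
    """
    total = 0
    g = 0
    for rank, c in enumerate(sorted(citations, reverse=True), start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g


def i10_index(citations):
    """Number of papers with at least 10 citations each."""
    return sum(1 for c in citations if c >= 10)


record = [10, 5, 3, 1, 1]  # citation counts of one author's papers
print(g_index(record), i10_index(record))  # 4 1
```

For comparison, this record has an h-index of 3: the single highly cited paper lifts g above h, which is the “top contributions” effect the g-index is meant to capture.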

Venue impact of journals in the same field is usually computed using the journal impact factor (JIF) [?]. However, JIF computes a mean over a heavy-tailed distribution of citation counts and is thus of limited value as a statistical measure. [?] describes the EigenFactor metric for ranking journals based on PageRank over journal-journal citation graphs generated through network inference via paper citations.

3.1.2 Impact Prediction 

A number of works utilizing regression and classification have been proposed in the past with the aim of predicting citation count or otherwise quantifying research success. Though our work does not directly focus on prediction, these works relate to the concept of temporality for impact metrics and are thus described. [22] proposes k nearest neighbors (KNN) regression on citation count differences over previous years for the KDD Cup 2003 citation prediction task. [28] identifies features distinguishing well and poorly cited papers on local reference networks of paper-paper citation graphs. [21] and [35] use various regression techniques, including support-vector regression (SVR) and linear regression (LR), on numerous features to predict field-specific paper citation count several years ahead. [?] uses multiple classification models to determine whether a given paper will increase the author’s h-index or not. [33] identifies some important mechanisms that play a role in long-term paper citation count, including aging and the Matthew effect.

Table 1: Qualitative comparison with modern research impact metrics (✓ = satisfies the property, partial = partially satisfies it, ✗ = does not satisfy it).

Metric          Volume     Prestige   Extensible  Temporal  Interpretable  Robust  Computable
                Sensitive  Sensitive
Citation count  ✓          ✗          ✓           ✓         ✓              ✗       ✓
h-index         partial    partial    ✗           ✓         ✓              ✗       ✓
JIF             ✓          ✗          ✓           ✓         ✓              ✗       ✓
PageRank        ✓          partial    ✓           ✓         partial        ✓       partial
s-index         ✓          ✓          ✓           ✓         ✓              ✓       ✓

3.2 Analysis 

In this section, we qualitatively evaluate the performance of current state-of-the-art impact metrics which are commonly used today. Given the breadth of the previously described prior work, we select 4 representative approaches which sufficiently span the range of existing approaches: (a) citation count, (b) h-index, (c) JIF and (d) PageRank. Table 1 gives a high-level summary of the strengths and weaknesses of each approach with respect to the desiderata identified in Section 2. For computability results, we assume a heterogeneous citation graph $G$, defined as follows:

Definition 1 (Citation graph G). $G$ has $|P|$ paper nodes, $|A|$ author nodes and $|V|$ venue nodes, as well as $|E_{pp}|$ paper-paper edges, $|E_{pa}|$ paper-author edges and $|E_{pv}|$ paper-venue edges, where edge $p_1 \to p_2$ denotes that paper $p_1$ is cited by paper $p_2$, $p \to a$ denotes that paper $p$ is authored by author $a$, and $p \to v$ denotes that paper $p$ is published by venue $v$.
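One possible in-memory layout for $G$ is sketched below in Python, assuming papers, authors and venues are identified by strings and that each paper has a single venue. Citation edges are stored in both directions so that a paper's citers and its references can each be looked up directly. The class and method names are hypothetical.

```python
from collections import defaultdict


class CitationGraph:
    """Heterogeneous paper-author-venue graph G (hypothetical layout)."""

    def __init__(self):
        self.cited_by = defaultdict(set)    # p1 -> papers that cite p1
        self.references = defaultdict(set)  # p2 -> papers that p2 cites
        self.authors = defaultdict(set)     # paper -> its authors (E_pa)
        self.venue = {}                     # paper -> its venue (E_pv)
        self.papers = set()

    def add_citation(self, cited, citing):
        """Add edge cited -> citing, i.e. `citing` cites `cited`."""
        self.papers.update((cited, citing))
        self.cited_by[cited].add(citing)
        self.references[citing].add(cited)

    def add_authorship(self, paper, author):
        self.papers.add(paper)
        self.authors[paper].add(author)

    def set_venue(self, paper, venue):
        self.papers.add(paper)
        self.venue[paper] = venue

    def sizes(self):
        """Return (|P|, |A|, |V|, |E_pp|, |E_pa|, |E_pv|)."""
        all_authors = set().union(*self.authors.values())
        return (
            len(self.papers),
            len(all_authors),
            len(set(self.venue.values())),
            sum(len(s) for s in self.cited_by.values()),
            sum(len(s) for s in self.authors.values()),
            len(self.venue),
        )


g = CitationGraph()
g.add_citation("p1", "p2")  # p2 cites p1
g.add_citation("p1", "p3")  # p3 cites p1
g.add_authorship("p1", "alice")
g.add_authorship("p2", "alice")
g.set_venue("p1", "KDD")
print(g.sizes())  # (3, 1, 1, 2, 2, 1)
```

Keeping both directions doubles the storage for $E_{pp}$ but lets later sketches walk towards descendants or ancestors without rescanning the edge list.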

3.2.1 Citation count 

Citation counting involves tallying the number of citations received by a paper, author or venue. We will denote the number of citations of a paper $p$ by $C(p)$.

It is a purely volume sensitive metric and offers no prestige sensitivity, since all citations are weighted equally in computing impact. In many cases, citation count does not correspond to the supposed impact of a paper - for example, [?] shows substantial differences between “best papers” from computer science conferences and the most cited ones. This can happen for numerous reasons: in the case of ubiquitously used results whose original work is no longer cited, in the case of incremental works leading up to an important result, or in the case of older papers which are buried by newer literature. Citation counting is extensible, as it can be applied to authors and venues quite easily. Furthermore, it is temporal given appropriate conditioning on input data. It offers straightforward interpretability as the number of citations of an entity. However, it is not robust, given that it is highly susceptible to disreputable practices such as self-citation, double publication and citation trading - [?] shows that self-citation makes up a significant part of general citation activity and that its presence plays a substantial role in the citation counts of papers and authors. Citation counting is easily computable, in $O(|E_{pp}|)$ for all papers in the paper-paper citation graph $G$.
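The $O(|E_{pp}|)$ bound corresponds to a single pass over the paper-paper edge list. A minimal Python sketch, assuming edges arrive as (cited, citing) pairs without duplicates; the aggregation step illustrates extensibility to authors and venues by summing paper-level counts over $E_{pa}$ or $E_{pv}$, which is one simple choice among several.

```python
from collections import Counter


def paper_citation_counts(edges):
    """C(p) for every cited paper, in one pass over E_pp.

    `edges` holds (cited, citing) pairs: edge p1 -> p2 means p2 cites p1.
    """
    return Counter(cited for cited, _citing in edges)


def aggregate(counts, membership):
    """Sum paper-level counts per entity over (paper, entity) pairs (E_pa or E_pv)."""
    totals = Counter()
    for paper, entity in membership:
        totals[entity] += counts[paper]  # Counter yields 0 for uncited papers
    return totals


edges = [("p1", "p2"), ("p1", "p3"), ("p2", "p3")]
C = paper_citation_counts(edges)
print(C["p1"], C["p2"], C["p3"])  # 2 1 0
print(aggregate(C, [("p1", "alice"), ("p2", "alice"), ("p2", "bob")]))
```

Note that summing paper counts credits an author twice when one citing paper cites two of their papers; counting distinct citing papers per author would be an equally valid alternative definition.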


3.2.2 h-index 


The h-index of an author $a$ with published papers $P^a$ is defined as

$$H(a) = \max_{h \in \mathbb{N}^+} h \quad \text{s.t.} \quad \sum_{p \in P^a} [\, C(p) \ge h \,] \ge h$$

where $[\,\cdot\,]$ serves as an indicator function. Informally, it is the maximal $h$ for which the author has $h$ papers with $h$ or more citations each.

The h-index is only somewhat volume sensitive and prestige sensitive, despite considering some concept of both paper quantity and quality. To illustrate, let us consider two scientists $a$ and $b$ with varying publication records. Suppose that both $a$ and $b$ have published 10 papers with 10 citations each, but $b$ has additionally published 90 papers which received 9 citations each. Counterintuitively, both scientists have an equivalent h-index of 10, despite scientist $b$’s much higher quantity of work. Alternatively, consider scientists $c$ and $d$, who have both published 5 papers. However, each of $c$’s papers has 5 citations, whereas each of $d$’s papers has 500 citations. Once again, both authors have an equivalent h-index of 5, despite scientist $d$’s much higher quality of work. The h-index is defined only in the author context (though it is sometimes used for venue impact) and offers no extension to paper impact, and is thus not extensible. It is, however, temporal given appropriate conditioning on input data. It has an interpretable definition as well. [27] and [?] show that strategic self-citation can dramatically boost the h-index over time due to the metric’s equivalent treatment of self-citations and citations from others, thus limiting robustness. The h-index is relatively computable and can be computed in roughly $O(|E_{pp}| + |E_{pa}| + |A|\, w \log w)$ for $|A|$ authors and $w$ mean paper-author edges (papers per author).
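The definition, and both counterexamples above, can be checked with a short Python sketch. Sorting each author's citation counts is what contributes the $w \log w$ term in the complexity estimate; the function name is chosen here for illustration.

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    h = 0
    for rank, c in enumerate(sorted(citations, reverse=True), start=1):
        if c < rank:
            break  # counts only decrease from here, so no larger h can qualify
        h = rank
    return h


a = [10] * 10
b = [10] * 10 + [9] * 90  # same as a, plus 90 papers with 9 citations each
c = [5] * 5
d = [500] * 5             # same number of papers as c, 100x the citations
print(h_index(a), h_index(b), h_index(c), h_index(d))  # 10 10 5 5
```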


3.2.3 Journal Impact Factor 

The JIF of a venue $v$ with papers published in the previous 2 years, $P^v_2$, is defined as

$$J(v) = \frac{\sum_{p \in P^v_2} C(p)}{|P^v_2|}$$

Thus, it is computed as the mean number of citations received by papers published in that time frame.

JIF shares similar properties with citation counting given its innate dependence on citation count. It is volume sensitive but not prestige sensitive. It is extensible, as citations can be agglomerated in a paper, author or venue context (though it should not be used in practice outside of the venue context [10]). It is additionally temporal given mean computation over the last $t$ years (though $t = 2$ in practice, any $t \in \mathbb{N}$ could be used more generally). JIF is also interpretable. However, it is not robust, due both to its sole emphasis on citation quantity and to the coercive self-citation and impact factor boosting tricks mentioned in [34] and [25]. Furthermore, given that JIF computes a mean over a power-law distribution, it is highly susceptible to “black-swan” outliers - for example, the impact factor of the journal Acta Crystallographica rose from 2.05 to 49.93 in 2009, higher than that of Nature and Science, as the result of just 1 publication [30]. JIF is easily computable, in $O(|E_{pp}| + |E_{pv}|)$ for all journals.
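Under the simplified definition above, a Python sketch of $J(v)$ for a single venue shows how one outlier dominates the mean. Papers are given as hypothetical (publication year, citation count) pairs, and the toy numbers are made up for illustration; they are not the Acta Crystallographica data.

```python
from statistics import mean, median


def impact_factor(papers, year, window=2):
    """Mean C(p) over the venue's papers published in the `window` years before `year`.

    `papers` is a list of (publication_year, citation_count) pairs for one venue.
    """
    recent = [c for y, c in papers if year - window <= y < year]
    return float(mean(recent)) if recent else 0.0


typical = [(2007, 2)] * 50 + [(2008, 2)] * 49
with_outlier = typical + [(2008, 5000)]  # a single "black-swan" paper
print(impact_factor(typical, 2009))        # 2.0
print(impact_factor(with_outlier, 2009))   # 51.98
print(median(c for _, c in with_outlier))  # 2.0
```

The median is printed only as a contrast: it ignores the single outlier, whereas the mean that JIF relies on increases roughly 26-fold.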


3.2.4 PageRank

The PageRank $PR(p)$ of paper $p$ in $G$ is defined as

$$PR(p) = \frac{1-d}{|P|} + d \sum_{q \in L(\{p\})} \frac{PR(q)}{|L^{-1}(\{q\})|}$$

where $d$ is a damping factor in $(0,1)$, $L(\{p\})$ denotes the papers by which $p$ is cited and $L^{-1}(\{q\})$ denotes $q$’s references. It can also be computed as the dominant eigenvector of the associated stochastic Google matrix described in [18].

PageRank is volume sensitive, since the more citations a paper $p$ has, the higher its PageRank in otherwise equivalent contexts. However, PageRank has limited prestige sensitivity - although papers with many citations pass greater influence to their own references, PageRank has the property that a paper’s propagated influence is apportioned equally between its references. This means that a citation from a paper with many references is less important than one from a paper with fewer references. In the web context in which PageRank was originally proposed, this assumption makes sense, given that pages which have many external links are often link farms or low-quality web indices. However, in the citation graph context, the length of a work’s reference list is not a measure of exclusivity but rather of relevance - it is certainly not obvious that the length of the reference list of a paper bears any influence on the quality of the work itself. In fact, some of the works with the highest reference counts are textbooks, surveys and tutorials, which are highly impactful in making technical expertise accessible to many authors. Thus, we argue that this exclusivity property of PageRank is ill-suited to represent prestige in citation graphs. PageRank is extensible given appropriate network inference to construct author-author and journal-journal citation graphs. It is also temporal given appropriate conditioning on input data. Though PageRank in web contexts has the traditional “random surfer” interpretation, the model lacks interpretability as a metric in the research impact context given the exclusivity assumption. Furthermore, PageRank suffers from computability issues, including the many iterations required for sufficient convergence in practice (despite $O(|E_{pp}|)$ runtime per iteration) and generally inadequate machine precision given the large $|P|$; as a result, the $(0,1]$ scores it produces are difficult to interpret in the research impact context.
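A minimal power-iteration sketch of the formula above in Python, set up to expose the exclusivity property: papers X and Y each receive exactly one citation from otherwise equivalent papers, but Y's citer has ten references and so passes on a tenth of the influence. Spreading the score of papers with no references uniformly is an assumption added here (the formula above does not specify how such dangling papers are handled), and `d=0.85` and the fixed iteration count are illustrative defaults.

```python
def pagerank(references, d=0.85, iters=100):
    """Power iteration for PR(p) = (1-d)/|P| + d * sum_{q cites p} PR(q)/|refs(q)|.

    `references` maps each paper q to the distinct papers it cites.
    Papers with no references are treated as dangling: their score is
    spread uniformly so that scores keep summing to 1 (an added assumption).
    """
    papers = set(references)
    for refs in references.values():
        papers.update(refs)
    papers = sorted(papers)
    n = len(papers)
    pr = {p: 1.0 / n for p in papers}
    for _ in range(iters):
        dangling = sum(pr[p] for p in papers if not references.get(p))
        new = {p: (1 - d) / n + d * dangling / n for p in papers}
        for q, refs in references.items():
            for p in refs:
                new[p] += d * pr[q] / len(refs)  # equal split across q's references
        pr = new
    return pr


refs = {
    "S": ["X"],                                # S cites only X
    "R": ["Y"] + [f"Z{i}" for i in range(9)],  # R cites Y and 9 other papers
}
pr = pagerank(refs)
print(pr["X"] > pr["Y"])  # True: same citation count, lower PageRank for Y
```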


4. PROPOSED S-INDEX 

In this section, we first build intuition towards, and next define 
the s-index for quantifying paper, author and venue impact. Lastly, 
we give a scalable algorithm for computing s-index efficiently. 

4.1 Intuition 

We begin by posing the following problem: 

Problem 1 (Paper impact). Given: the citation graph G, 
find: a metric by which to quantify the value of papers according 
to their research impact. 

We start from the most fundamental idea: counting the number 
of citations of each paper in order to rank them is a straightforward 
first-order approximation of any volume sensitive impact metric. 


Citation count can in some sense be construed as the immediate usefulness of a paper to other researchers. If paper $p_1$ is cited by paper $p_2$, we take this to mean that $p_2$ has derived some useful information from $p_1$ - in other words, $p_1$ influenced $p_2$. However, citation count focuses only on the quantity of citations and places no emphasis on their quality.

To incorporate prestige sensitivity, we can then examine the citations which paper $p_2$ received. For ease of explanation, we now define a function $L(H)$ which, given a subset of papers $H \subseteq P$, returns the maximal subset of papers $T \subseteq P$ which cite some paper $h \in H$ - in other words, for each $t \in T$ there exists an edge $h \to t$ in $E_{pp}$ for some $h \in H$. These are referred to as the descendants of $H$. We additionally define $L^k(H)$ for $k \in \mathbb{N}$, which denotes $k$ compositions of $L$ - that is, $L^1(H) = L(H)$, $L^2(H) = L(L(H))$ and so on. It will also be useful to define behavior for $k \le 0$. $k = 0$ indicates 0-step neighbors, meaning that $L^0(H) = H$. For $k < 0$, we consider the ancestors of the nodes in $H$ rather than the descendants, such that $L^{-1}(H)$ returns the maximal subset of papers $T \subseteq P$ which are cited by some paper $h \in H$ - in other words, for each $t \in T$ there exists an edge $t \to h$ in $E_{pp}$ for some $h \in H$. The compositional behavior of $L^k(H)$ for negative $k$ is defined similarly: $L^{-2}(H) = L^{-1}(L^{-1}(H))$ and so on. Note that we refer to proximal links as ancestors and descendants rather than in-links and out-links to clarify that the paper-paper graph is not just directed, but also effectively acyclic - reciprocal citation relationships are extremely rare given the temporal connotation of citation, and are only possible when a paper cites one that is published after it.
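The $L^k$ operator amounts to a repeated one-step neighborhood expansion: descendants follow “cited by” edges and ancestors follow reference edges. A Python sketch, assuming edges are given as (cited, citing) pairs as in Definition 1; the function names are hypothetical. The example graph has $p_2$ and $p_3$ citing $p_1$, and $p_3$ also citing $p_2$.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Split (cited, citing) edges into descendant and ancestor maps."""
    cited_by = defaultdict(set)    # h -> papers citing h (one step of L)
    references = defaultdict(set)  # h -> papers cited by h (one step of L^-1)
    for cited, citing in edges:
        cited_by[cited].add(citing)
        references[citing].add(cited)
    return cited_by, references


def L(H, k, cited_by, references):
    """L^k(H): k > 0 descendants, k < 0 ancestors, k == 0 returns H itself."""
    adjacency = cited_by if k > 0 else references
    current = set(H)
    for _ in range(abs(k)):
        current = set().union(*(adjacency.get(h, set()) for h in current))
    return current


edges = [("p1", "p2"), ("p1", "p3"), ("p2", "p3")]
cited_by, references = build_adjacency(edges)
print(sorted(L({"p1"}, 1, cited_by, references)))   # ['p2', 'p3']
print(sorted(L({"p1"}, 2, cited_by, references)))   # ['p3']
print(sorted(L({"p3"}, -1, cited_by, references)))  # ['p1', 'p2']
print(sorted(L({"p3"}, -2, cited_by, references)))  # ['p1']
```

Each expansion returns a set, so $L^k(H)$ records which papers are reachable in exactly $k$ steps but not how many distinct walks lead to them.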

In our example so far, $L(\{p_1\}) = \{p_2\}$, and presuming that $p_2$ only cites $p_1$, $L^{-1}(\{p_2\}) = \{p_1\}$. If $p_2$ has received a large number of citations, then we can consider $p_2$’s citation to $p_1$ as more valuable than, for example, a citation from a paper $p_3$ which also cites $p_1$ but itself has fewer citations. Intuitively, we use $p_2$’s quantity of citations as a measure of the quality of $p_2$’s citation to $p_1$. Thus, we examine $p_1$’s 2-step descendants $L^2(\{p_1\})$, instead of the 1-step descendants $L(\{p_1\})$ as for simple citation count. In fact, we can take yet another step and look at the 3-step descendants $L^3(\{p_1\})$ in order to gain further confidence in the quality of papers in $p_1$’s 1-step and 2-step descendants, and so on.

Moreover, just as $p_1$ influenced $p_3$ (more generally $L(\{p_1\})$), we can consider that $p_1$ also influences the papers which cite $p_3$ (more generally $L^2(\{p_1\})$). This is because those papers which cite $p_3$ have indirectly drawn some useful information from $p_1$ by means of $p_3$. One can imagine that the more steps away from $p_1$ a paper is in $G$, the less it draws from, or is influenced by, $p_1$. Thus, walk length and influence should be inversely correlated. Given the nature of the problem, we conjecture that a constant fraction of the influence wanes with each further step away from $p_1$ - thus, influence should decay exponentially with respect to path length. This assumption is in line with the damping factor idea used in PageRank. Though we could consider arbitrary walk lengths from each node, spanning descendants as far as the full diameter of $G$, it is more intuitive to consider shorter lengths in practice, given that the proportion of influence passed from a paper $a$ to a paper $b$ decays rapidly as the walk between the two becomes longer. Note that we do not weight influence by the inverse of the number of 1-step ancestors of $p_3$, $|L^{-1}(\{p_3\})|$, as this idea is characteristic of PageRank; we discuss in Section 3.2.4 the reasons for which this notion is ill-suited to the impact context.

Given that there are potentially arbitrarily many walks from $p_1$ to
$p_3$, which should we choose? One option is to choose the shortest:
the shorter the shortest walk, the more $p_1$ influences $p_3$.
Although this idea is reasonable, we can in fact leverage an even
more expressive option. Figure 2 shows the
case of a "feed-forward loop," in which paper $p_1$ is cited by both
$p_2$ and $p_3$, but $p_2$ is also cited by $p_3$, which makes the weakness of
the shortest-walk idea apparent. Specifically, the shortest-walk notion
conveys that $p_1$ influences $p_2$ and $p_3$ equally, given that the shortest
walks to both papers are of length 1. However, given that in addition
to citing $p_1$ directly, $p_3$ also cites $p_2$, which cites $p_1$, we can
say that $p_1$ influences $p_3$ both directly and by proxy via $p_2$, but only
influences $p_2$ directly. In practice, this interaction further
substantiates the influence that $p_1$ has on $p_3$, suggesting that $p_3$ has been
more influenced by $p_1$ than $p_2$ has. Thus, making use of multiple walks
between papers and the interactions they represent is a promising
approach for defining a more powerful measure of influence.

Figure 2: A "feed-forward loop" in which paper $p_1$ is cited by both
$p_2$ and $p_3$, but $p_2$ is also cited by $p_3$.
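The feed-forward loop of Figure 2 can be checked with a small walk-counting sketch, assuming an adjacency-list encoding in which `cited_by[p]` lists the papers citing `p` (a hypothetical representation; `walks_from` is likewise our own name). For each reachable paper it reports the length of the shortest walk from $p_1$ and the total number of walks.

```python
from collections import Counter

# Figure 2: p1 is cited by p2 and p3, and p2 is also cited by p3.
cited_by = {"p1": ["p2", "p3"], "p2": ["p3"], "p3": []}

def walks_from(source, max_len):
    """Return {target: (shortest walk length, total number of walks)} up to max_len."""
    shortest, total = {}, Counter()
    frontier = Counter({source: 1})  # node -> walks of the current length ending there
    for length in range(1, max_len + 1):
        nxt = Counter()
        for node, n in frontier.items():
            for child in cited_by[node]:
                nxt[child] += n
        for target, n in nxt.items():
            shortest.setdefault(target, length)  # first length at which target is reached
            total[target] += n
        frontier = nxt
    return {t: (shortest[t], total[t]) for t in shortest}

print(walks_from("p1", 3))  # {'p2': (1, 1), 'p3': (1, 2)}
```

Both papers sit at shortest-walk distance 1, but $p_3$ is reached by two walks ($p_1 \to p_3$ and $p_1 \to p_2 \to p_3$) while $p_2$ is reached by one, which is exactly the distinction the shortest-walk notion misses.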

This concept of influence propagating between papers is
characteristic of the phrase "standing on the shoulders of giants." With
this intuition, we express that if paper $a$ influences paper $b$, $a$ should
get some credit for $b$'s successes. This concept enforces robustness,
as self-citation and citation trading practices become far less
impactful in comparison to producing highly influential work, which
can enjoy exponential s-index growth. This is exactly the notion
that the s-index is built on.

4.2 Definition 

With the established concepts from Section 4.1, we define the
s-index $S$ of a paper $p$ as

$$S(p) = \sum_{i=1}^{m} I(i)\, W(p, L^i(p))$$

where $I(i) = d^i$ gives an exponentially decaying influence weight
varying with fraction $d$ and walk length $i = 1, \ldots, m$, and $W(p, L^i(p))$
denotes the total number of length-$i$ walks from paper $p$ to the nodes
reachable in $i$ steps (specifically, $L^i(p)$).
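A direct, unoptimized reading of this definition is sketched below in Python. It assumes hypothetical `cited_by` adjacency lists as a toy input and uses $d = 0.5$ and $m = 4$ as defaults, the values chosen in Section 5.3; the function name `s_index` is ours.

```python
def s_index(p, cited_by, d=0.5, m=4):
    """S(p) = sum_{i=1}^{m} d**i * W(p, L^i(p)), W = number of length-i walks from p."""
    score = 0.0
    walks = {p: 1}  # number of length-(i-1) walks from p ending at each paper
    for i in range(1, m + 1):
        nxt = {}
        for node, n in walks.items():
            for child in cited_by.get(node, []):
                nxt[child] = nxt.get(child, 0) + n
        score += d ** i * sum(nxt.values())  # I(i) * W(p, L^i(p))
        walks = nxt
    return score

# Hypothetical toy graph: cited_by[p] lists the papers citing p.
cited_by = {"p1": ["p2", "p3"], "p2": ["p4", "p5", "p6"], "p4": ["p7"]}
print({p: s_index(p, cited_by) for p in ["p1", "p2", "p3", "p4"]})
# {'p1': 1.875, 'p2': 1.75, 'p3': 0.0, 'p4': 0.5}
```

Although $p_2$ has more direct citations than $p_1$ (three versus two), $p_1$ scores higher because it also receives discounted credit for $p_2$'s citations and for the citation of $p_4$.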

Having established the s-index for quantifying paper value
according to impact, we can now pose the following (analogous)
problems for authors and venues.

Problem 2 (Author impact). Given: the citation graph
$G$, find: a metric by which to quantify the value of authors
according to their research impact.

Problem 3 (Venue impact). Given: the citation graph $G$,
find: a metric by which to quantify the value of venues according
to their research impact.

We treat these problems similarly, using the derived paper s-index
to demonstrate extensibility to the broader author
and venue contexts.

We argue that the impact of an author is simply defined by the
total impact of the works they produce. Thus, we define the s-index
of an author $a$ as

$$S(a) = \sum_{p \in P^a} S(p) = \sum_{p \in P^a} \sum_{i=1}^{m} I(i)\, W(p, L^i(p))$$

where $P^a$ is the set of papers written by author $a$.

In fact, venue impact can be defined in a similar way. Thus, we
define the s-index of a venue $v$ as

$$S(v) = \sum_{p \in P^v} S(p) = \sum_{p \in P^v} \sum_{i=1}^{m} I(i)\, W(p, L^i(p))$$

where $P^v$ is the set of papers published in venue $v$. By this
definition, the most impactful venues are those which have published
the most impactful work. Note that we do not take the mean over
the number of papers as in JIF, for two reasons: (a) the mean is a
statistically inappropriate measure for heavy-tailed distributions due to
its sensitivity to outliers, and (b) JIF reflects the reputation of the journal
rather than its impact (a venue which accepts very few articles with
modest citation counts will often have a higher JIF than a journal
which accepts more articles with a wider variety of citation counts).
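The author and venue definitions only aggregate paper scores, as the following sketch shows. The paper scores and the memberships $P^a$ and $P^v$ are made-up toy values, and `total_impact` / `mean_impact` are illustrative helper names; the mean is included only to contrast with a JIF-style average, which the s-index deliberately avoids.

```python
# Hypothetical paper-level scores S(p), e.g. from a per-paper s-index computation.
paper_score = {"p1": 1.875, "p2": 1.75, "p3": 0.0, "p4": 0.5, "p5": 1.0}

# Hypothetical memberships: P^a (papers per author) and P^v (papers per venue).
papers_by_author = {"alice": ["p1", "p4"], "bob": ["p2", "p3"]}
papers_by_venue = {"v_large": ["p1", "p3", "p4"], "v_small": ["p5"]}

def total_impact(groups):
    """S(x) = sum of S(p) over p in P^x, for an author or a venue x."""
    return {x: sum(paper_score[p] for p in papers) for x, papers in groups.items()}

def mean_impact(groups):
    """JIF-style average, shown only for contrast; not part of the s-index."""
    return {x: sum(paper_score[p] for p in papers) / len(papers)
            for x, papers in groups.items()}

authors = total_impact(papers_by_author)
venues = total_impact(papers_by_venue)
means = mean_impact(papers_by_venue)
print(authors)                                                 # {'alice': 2.375, 'bob': 1.75}
print(max(venues, key=venues.get), max(means, key=means.get))  # v_large v_small
```

The sum ranks `v_large` first, while the JIF-style mean favors `v_small`, which published a single moderately scored paper; this mirrors point (b) above.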

We have now defined the s-index for quantifying the impact of
papers, authors and venues. The s-index can be interpreted as a modified
citation count which incorporates both direct and indirect impact. It
is worth noting that the s-index is meant to be construed as a "lifetime
achievement award," as it does not consider recency. However, it
offers very clear temporal extensions which can be useful in several
scenarios, which we now present.

In the paper ranking context, one can compute the s-index over the
last $r$ years by using $L_r(H)$ instead of $L(H)$, where $L_r(H)$ only
considers citations from works published within the previous $r$ years.
It follows naturally that all further descendants $L_r^k(H)$ for $k > 1$
were published in the last $r$ years as well, given the temporal
connotation of citation edges. This measure takes into account the recent
impact of the paper in spurring new work in only the last $r$ years;
it does not consider change in influence over the previous $r$ years
from older citations, as incorporating this influence would unfairly
bias the comparison towards older papers that had established many
descendants. We define the $s_r$-index for papers as

$$S_r(p) = \sum_{i=1}^{m} I(i)\, W(p, L_r^i(p))$$
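One way to realize $L_r$ is to drop citation edges whose citing paper falls outside the window and then reuse the ordinary s-index computation, sketched here in Python. The publication years, the helper `recent_graph`, and the boundary convention that "the last $r$ years" means `year > current_year - r` are all assumptions for illustration.

```python
def s_index(p, cited_by, d=0.5, m=4):
    """S(p) = sum_{i=1}^{m} d**i * W(p, L^i(p)) over the given citation lists."""
    score, walks = 0.0, {p: 1}
    for i in range(1, m + 1):
        nxt = {}
        for node, n in walks.items():
            for child in cited_by.get(node, []):
                nxt[child] = nxt.get(child, 0) + n
        score += d ** i * sum(nxt.values())
        walks = nxt
    return score

def recent_graph(cited_by, year, current_year, r):
    """L_r: keep only citations made by papers published in the last r years.

    Assumption: "the last r years" means year > current_year - r.
    """
    return {p: [c for c in citers if year[c] > current_year - r]
            for p, citers in cited_by.items()}

# Hypothetical toy data: p1 is cited by p2 (2000) and p3 (2018); p2 is cited by p4 (2020).
cited_by = {"p1": ["p2", "p3"], "p2": ["p4"], "p3": [], "p4": []}
year = {"p1": 1995, "p2": 2000, "p3": 2018, "p4": 2020}
recent = recent_graph(cited_by, year, current_year=2020, r=5)
print(s_index("p1", cited_by), s_index("p1", recent))  # 1.25 0.5
```

The walk $p_1 \to p_2 \to p_4$ no longer counts toward $S_r(p_1)$: its first step is a citation from $p_2$, published in 2000, even though $p_4$ itself is recent. This is the intended exclusion of older citations.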

For the author ranking context, we define $P_r^a$ instead of $P^a$, which
contains only papers published by the author which have received
citations in the previous $r$ years. Again, it follows that for every
$p \in P_r^a$, the papers in $L_r^k(p)$ for $k > 0$
were also published in the last $r$ years. This extension is especially
useful, as one can easily compare how impactful two authors have
been in recent years. We define the $s_r$-index for authors as

$$S_r(a) = \sum_{p \in P_r^a} S_r(p) = \sum_{p \in P_r^a} \sum_{i=1}^{m} I(i)\, W(p, L_r^i(p))$$

In the same way, we can define $P_r^v$ instead of $P^v$ in the venue
ranking context, which contains only papers published at that venue
which received citations in the previous $r$ years. The same principle
for the recency of the descendants of $p \in P_r^v$ holds as in the other
contexts. We define the $s_r$-index for venues as

$$S_r(v) = \sum_{p \in P_r^v} S_r(p) = \sum_{p \in P_r^v} \sum_{i=1}^{m} I(i)\, W(p, L_r^i(p))$$




Algorithm 1: s-index

Data: paper-paper adj. matrix A, author-paper adj. matrix B,
      venue-paper adj. matrix C, decay factor d, walk length m
Result: paper scores s_p, author scores s_a, venue scores s_v

s_p = 0        // dim |P| x 1
s_a = 0        // dim |A| x 1
s_v = 0        // dim |V| x 1
v = 1          // dim |P| x 1
for i = 1 to m do
    v = A · v
    s_p += d^i · v
s_a = B · s_p
s_v = C · s_p

Unlike the s-index, the $s_r$-index is not meant to be used for
measuring overall impact. Rather, it is an adaptation which accounts
for temporality. In many cases, it is more relevant to examine
recent performance and impact than the aggregate,
including for recurring performance evaluation, ranking for modern
relevance and comparison purposes. Note that one could
additionally filter $s_r$-index results to include only papers published
within, rather than merely receiving citations within, the recent $r$ years,
to compare the impact of only new papers if desired. The former is a
subset of the latter and can easily be computed post-hoc.

s-index and $s_r$-index values can be quite large, especially for
very influential papers, authors and venues. For human parsing,
we can scale the result into a comprehensible range by using
$\hat{S} = \log_2(S)$ and $\hat{S}_r = \log_2(S_r)$ in practice. The logarithm function
is monotonically increasing and thus preserves ranking under the
transformation. It further offers the attractive interpretable property
of "doubled impact" for each additional point.
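The rescaling is a single monotone transform; the sketch below checks both properties claimed here on toy scores. The text does not say how to treat a score of exactly 0, where $\log_2$ is undefined, so mapping it to $-\infty$ is our assumption, as is the helper name `scaled`.

```python
import math

def scaled(score):
    """S_hat = log2(S); mapping a zero score to -inf is our assumption."""
    return math.log2(score) if score > 0 else float("-inf")

scores = {"a": 1.0, "b": 2.0, "c": 4.0, "d": 0.75, "e": 0.0}  # toy values
by_raw = sorted(scores, key=scores.get, reverse=True)
by_scaled = sorted(scores, key=lambda k: scaled(scores[k]), reverse=True)
print(by_raw == by_scaled)        # True: the ranking is unchanged
print(scaled(4.0) - scaled(2.0))  # 1.0: doubling a score adds one point
```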

4.3 Algorithm 

The last property the s-index must satisfy is efficient
computability. The most expensive component of the
proposed computation is clearly calculating the total number of walks of
varying length from each paper. While computing these values
seems computationally daunting, it is not so in practice with careful
design. In fact, there exist much better solutions than the naive
approach of counting walks on a per-node basis via local graph
search, which quickly becomes exponentially costly depending on
path length and the connectedness of $G$.

One promising approach involves computing the number of walks
in graph $G$ of varying length $i$ by taking powers of the adjacency
matrix $A$ of $G$. It is well known that cell $(p_1, p_2)$ of $A^i$ gives the
total number of walks of length $i$ from $p_1$ to $p_2$. We can next
compute the row-sum for each $p_1$ to get the number of walks of length
$i$ from $p_1$, and weight the result according to $I(i) = d^i$.
Moreover, we can iteratively compute $A^i$ by keeping only $A^{i-1}$ and $A$
in memory. However, while sparse matrix multiplication is
relatively efficient even for large matrices, memory constraints quickly
become prohibitive given the increase in density of nonzeros with
each additional exponentiation. In our experiments on a machine
with 400GB RAM, one power of $A$ already cost roughly 110GB RAM to store,
and computing a further power resulted in an out-of-memory error; we
expect its memory cost would exceed 1TB.
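For intuition, the matrix-power approach can be sketched on the three-paper feed-forward loop of Figure 2 with plain Python lists. Row $p$ of `A` marks the papers that cite $p$, so that $(A^i)_{p,q}$ counts length-$i$ walks from $p$ to $q$; the helper names are hypothetical, and a dense toy matrix of course cannot reproduce the memory blow-up seen at scale.

```python
def matmul(X, Y):
    """Dense matrix product of two list-of-lists matrices."""
    return [[sum(X[r][t] * Y[t][c] for t in range(len(Y))) for c in range(len(Y[0]))]
            for r in range(len(X))]

def walk_counts_by_power(A, m):
    """Row sums of A^i for i = 1..m: number of length-i walks starting at each paper."""
    power, counts = A, []
    for i in range(1, m + 1):
        if i > 1:
            power = matmul(power, A)  # A^i from A^(i-1) and A only
        counts.append([sum(row) for row in power])
    return counts

# Feed-forward loop (p1, p2, p3 = rows 0, 1, 2): p1 is cited by p2 and p3, p2 by p3.
A = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]
print(walk_counts_by_power(A, 3))  # [[2, 1, 0], [1, 0, 0], [0, 0, 0]]
```

Weighting these row sums by $d^i$ and adding them gives the paper scores; only the previous power and $A$ are held at each step.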

Fortunately, a more clever solution exists: instead of computing
$A^i$ and calculating the row-sum for each $p_1$, we can directly compute
the total number of length-$i$ walks from every paper as $A^i \mathbf{v}$, where $\mathbf{v} = \mathbf{1}^{|P| \times 1}$
is the all-ones column vector. Thus, we avoid direct computation of $A^i$
by instead computing $A(A(\cdots A\mathbf{v}))$ iteratively, in a manner
similar to power iteration (though nonstochastic and unnormalized). In
each iteration, we compute $\mathbf{v} = A\mathbf{v}$ and thus maintain the sparsity of
$A$. We can then use a separate 0-vector $\mathbf{s}$ to accumulate s-index
scores in a single pass. Each sparse matrix-dense vector multiplication
takes $O(|E_{pp}| + |P|)$ time, so with $m$ iterations
we get $O(m(|E_{pp}| + |P|))$ time complexity, which is linear in
the number of papers (nodes), citations (edges) and walk length $m$,
with only an additional $O(|E_{pa}| + |A|)$ and $O(|E_{pv}| + |V|)$ for authors and
venues, respectively. Furthermore, since we need to store only $A$,
$\mathbf{v}$ and $\mathbf{s}$ in each iteration, we can compute paper s-index using a
fixed space complexity of $O(|E_{pp}| + 2|P|)$, with only an additional
$O(|E_{pa}| + |A|)$ and $O(|E_{pv}| + |V|)$ for author and venue s-index,
respectively. Algorithm 1 gives the concise algorithm. To compute the
$s_r$-index, we simply use the adjacency matrix $A_r$ associated with
the induced subgraph $G_r$, containing only edges from papers
published in the last $r$ years; the complexity analysis is trivially similar.
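Algorithm 1 translates almost line for line into the Python sketch below. It stores each matrix as adjacency lists rather than using a sparse-matrix library, so `A[p]` holds the papers citing `p`, `B[a]` the papers of author `a`, and `C[v]` the papers of venue `v`; these encodings and the function name `s_index_all` are our assumptions.

```python
def s_index_all(A, B, C, d=0.5, m=4):
    """Sketch of Algorithm 1 with adjacency lists instead of sparse matrices.

    A[p]: papers citing paper p (row p of the paper-paper matrix)
    B[a]: papers written by author a (row a of the author-paper matrix)
    C[v]: papers published in venue v (row v of the venue-paper matrix)
    """
    n = len(A)
    v = [1.0] * n    # v = 1 (all-ones vector)
    s_p = [0.0] * n  # s_p = 0
    for i in range(1, m + 1):
        v = [sum(v[q] for q in A[p]) for p in range(n)]  # v = A . v
        for p in range(n):
            s_p[p] += d ** i * v[p]                       # s_p += d^i . v
    s_a = [sum(s_p[p] for p in papers) for papers in B]  # s_a = B . s_p
    s_v = [sum(s_p[p] for p in papers) for papers in C]  # s_v = C . s_p
    return s_p, s_a, s_v

# Feed-forward loop of Figure 2 with papers 0, 1, 2 = p1, p2, p3 (toy data).
A = [[1, 2], [2], []]
B = [[0, 2], [1]]  # two hypothetical authors
C = [[0, 1, 2]]    # one hypothetical venue
print(s_index_all(A, B, C))  # ([1.25, 0.5, 0.0], [1.25, 0.5], [1.75])
```

Each pass over `A` touches every citation edge once, so an iteration stays linear in the number of edges and papers, and only `v` and `s_p` are kept alongside `A`. Passing the adjacency lists of $G_r$ instead of $G$ yields the $s_r$-index.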

5. EXPERIMENTS 

In this section, we include qualitative and quantitative results
from applying the s-index and $s_r$-index to the Microsoft Academic
Search (MAS) citation graph. The graph consists of over 119
million papers, 1 billion citation edges, 103 million authors and 21
thousand venues; for a more detailed description, we refer the
reader to [29]. We begin by exploring some properties of the
s-index in practice. Next, we evaluate ranking correlation with
traditionally used metrics and report s-index and $s_r$-index ranking
results on the Microsoft Academic Search dataset. Lastly, we discuss
parameter selection and give results substantiating the scalability
of our approach.

5.1 Properties 

5.1.1 Distribution 

Figures 3a, 3b and 3c show the distributions of s-index scores
across papers, authors and venues found in the MAS graph,
respectively. The distributions are heavy-tailed, as
expected, and suggest lognormal behavior: few papers, authors
and venues are extremely impactful, whereas the majority are less
prolific. Though the original citation count distribution is much
closer to a power law, the s-index distribution becomes
increasingly curved with greater walk length $m$, as more and more
low-cited papers are pushed to higher ranks due to indirect impact being
accounted for.

5.1.2 Growth 

Although the growth over time of a paper, author or venue's
s-index depends entirely on how it impacts the scientific community,
one might expect that the score for a popular paper would increase
exponentially given the "fan-out" of the DAG rooted at a paper $p$,
induced from $G$ by the nodes in $\bigcup_{i=1}^{m} L^i(p)$ and their associated
edges. We find that for moderately and highly popular papers,
exponential growth is indeed enjoyed for a time; in fact, the full
s-index-over-time curves generally exhibit clear sigmoidal growth,
characterized by a period of dormancy, rapid direct and indirect
citation, and an eventual taper. The temporal length and rapidity of such
growth are of course determined by the innate popularity and
contemporary relevance of the paper.

Conversely, in cases where papers receive very few or no
citations, and those citations are themselves poorly cited, the growth is better
characterized as a step function in which changes to the s-index happen
sporadically over the years. The sigmoidal pattern is characteristic of the famous
"diffusion of innovations" theory proposed in [26], which describes







Figure 3: Distribution of s-index scores for (a) papers (b) authors and (c) venues. 


Table 2: Top s-index (top) and $s_5$-index (bottom) rankings on MAS data: papers and authors selected across data mining/database/machine
learning areas and venues across all areas.

Top 20 by s-index:

| Papers | Authors | Venues |
|---|---|---|
| Classification and Regression Trees | Robert E. Schapire | Cancer |
| Basic Local Alignment Search Tool | Jiawei Han | New England Journal of Med. |
| Occam’s Razor | Weiyin Loh | Proc. of the Natl. Acad. of Sci. |
| Pattern Recognition and Mach. Learning | Michael Kearns | The Lancet |
| The Strength of Weak Learnability | Stephen F. Altschul | Nature |
| C4.5: Programs for Mach. Learning | Webb C. Miller | Science |
| The Nature of Statistical Learning Theory | Warren Gish | Arthritis and Rheumatism |
| Neural Network Ensembles | David Lipman | Journal of the Amer. Med. Assoc. |
| The Protein Data Bank | Michael Stonebraker | Circulation |
| Advances in Knowledge Disc. and Data Mining | Usama Fayyad | Journal of Bio. Chem. |
| Genetic Alg. in Search Opt. and Mach. Learning | Christopher J. Merz | British Med. Journal |
| The Comp. Complexity of Mach. Learning | Rakesh Agrawal | Cell |
| Efficient Distrib.-free Learning of Prob. Concepts | Manfred K. Warmuth | Gastroenterology |
| The Des. and Anal. of Efficient Learning Algorithms | Ramez Elmasri | Blood |
| Separating Distrib.-free and Mist.-bound Learning Models | Christopher J. Date | Annals of Internal Med. |
| Reliable Scheduling in a TMR Database System | Sally A. Goldman | Pediatrics |
| Learning Binary Relations and Total Orders | Shamkant B. Navathe | Neurology |
| Mach. Learning: a Theoretical Approach | Padhraic Smyth | The Journal of Pediatrics |
| On-line Learning of Linear Functions | Catherine Blake | Annals of Neurology |
| Learning Decision Trees Using the Fourier Spectrum | John R. Quinlan | The Amer. Journal of Med. |

Top 20 by $s_5$-index:

| Papers | Authors | Venues |
|---|---|---|
| Basic Local Alignment Search Tool | Jiawei Han | Cancer |
| Pattern Recognition in Mach. Learning | Usama Fayyad | New England Journal of Med. |
| The Protein Data Bank | Webb C. Miller | Proc. of the Natl. Acad. of Sci. |
| Classification and Regression Trees | Ramez Elmasri | The Lancet |
| Syst. and Int. Anal. of Gene Lists using DAVID | Stephen F. Altschul | Nature |
| The Nature of Statistical Learning Theory | Warren Gish | Science |
| Social Network Analysis: Methods and Applications | Shamkant B. Navathe | Journal of the Amer. Med. Assoc. |
| Genetic Alg. in Search Opt. and Mach. Learning | David J. Lipman | Circulation |
| Advances in Knowledge Disc. and Data Mining | Eugene W. Myers | Journal of Bio. Chem. |
| Assoc. Rules and Data Mining in Hosp. Inf. Control | Padhraic Smyth | Applied Physics Letters |
| C4.5: Programs for Mach. Learning | Rakesh Agrawal | British Med. Journal |
| Gene Expr. Omnibus: NCBI Gene Expr. Data Repo. | Christopher J. Merz | Arthritis and Rheumatism |
| The Elements of Statistical Learning | Gregory Piatetsky-Shapiro | Cell |
| Data Mining: Concepts and Techniques | Christopher M. Bishop | Journal of Applied Physics |
| Data Preparation for Data Mining | Philip S. Yu | Blood |
| NCBI GEO: Arch. for Genomic Data | Nasser M. Nasrabadi | Annals of Internal Med. |
| Covering Numbers for Support Vector Mach. | Christopher W. Clifton | Amer. Journal of Resp. Med. |
| Maint. of Disc. Assoc. Rules in Large Databases | Catherine Blake | The Amer. Journal of Med. |
| Parallel Mining of Assoc. Rules | Christopher J. Date | Pediatrics |
| CDD: A Cons. Domain Database for Inter. Anal. | John R. Quinlan | The Journal of Pediatrics |


the process by which an innovation is communicated to participants
in a social system over time. The same sigmoidal diffusion pattern
cannot be well observed for citation count, presumably because it
only accounts for direct impact through citation.

5.2 Ranking Performance 

5.2.1 Similarity to Existing Metrics 

Figures 4a, 4b and 4c show the relationship between the s-index
and commonly used state-of-the-art metrics. Correspondence with
PageRank is not shown because its results are invalid for most
papers due to machine-precision issues (the overwhelming majority
of papers have 0 impact and no meaningful ranking).

Figure 4: Correspondence between s-index scores and (a) paper citation count, (b) author h-index and (c) venue JIF. Colors denote density
of points in logarithmically discretized bins in accordance with colorbars (red: high, blue: low).

It is evident (and expected) that in all cases there are positive
correlations between the respective s-index and metric scores.
Given that the Pearson correlation coefficient is ill-suited for tasks
involving non-linear relationships, we measure the strength of these
relationships with the Spearman rank correlation coefficient $\rho$, defined as

$$\rho = 1 - \frac{6 \sum_i d_i^2}{n(n^2 - 1)}$$

where $d_i = x_i - y_i$ is the difference between the ranks of sample $i$ and $n$ is the
number of samples. We find that $\rho = 0.78$ between s-index and citation count,
and $\rho = 0.49$ and $\rho = 0.76$ between s-index and h-index and JIF,
respectively. Perfectly correlated or inversely correlated rankings are
characterized by $\rho = 1$ and $\rho = -1$, respectively. [?] notes that
$\rho > 0.5$ is considered a "large" positive correlation.
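The closed-form coefficient can be computed as in the following sketch, which assumes no tied values, the setting in which the formula is exact; with ties, the usual practice is to assign average ranks and take the Pearson correlation of the ranks instead. The toy inputs and the helper names `ranks` and `spearman_rho` are hypothetical.

```python
def ranks(values):
    """Return 1-based ranks; assumes all values are distinct (no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    rank = [0] * len(values)
    for r, i in enumerate(order, start=1):
        rank[i] = r
    return rank

def spearman_rho(x, y):
    """rho = 1 - 6 * sum(d_i^2) / (n (n^2 - 1)), with d_i the rank difference."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

s_scores = [1.875, 1.75, 0.0, 0.5]  # hypothetical s-index values
citations = [2, 3, 0, 1]            # hypothetical citation counts
print(round(spearman_rho(s_scores, citations), 3))  # 0.8
```

The two toy rankings disagree only on the top pair, so $\sum_i d_i^2 = 2$ and $\rho = 1 - 12/60 = 0.8$.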

Interestingly, despite the generally strong numerical correlations
substantiated by the results in Figure 4, it is apparent that there are
many cases of poorly cited papers, low h-index authors and low
JIF venues with s-index scores characteristically higher than the
norm, and vice versa. Further substantiating the value of measuring
indirect impact via the s-index versus traditional "direct" metrics, we
find that 57 of the 62 past Turing Award winners can be found in the
top 0.5% of all authors ranked by s-index, as opposed to 50 when
ranked by h-index: a recall improvement of 11 percentage points
(57/62 ≈ 92% versus 50/62 ≈ 81%).


5.2.2 Findings in Practice 

Table 2 shows the top 20 papers, authors and venues ranked
using the s-index and $s_r$-index ($r = 5$) on the MAS graph. For paper and
author ranking, we use an induced subgraph of papers and authors
which have "field of study" labels corresponding to web mining,
data mining, social network analysis, databases and machine
learning. For venue ranking, we use the entire graph containing all
available data for papers. We rank in this distinctive fashion to keep the
discussion relevant to the reader, in line with the likelihood
of expert familiarity. For the same reason, we eliminate from the ranking
entries spuriously categorized into these fields as a result of the data
collection process. Several of the top papers and authors in these
rankings are well known in the bioinformatics field, and appear
because of their association with the data mining "field of study."

Interestingly, several earlier foundational works ranked by the
s-index disappear from the $s_r$-index list in favor of more up-and-coming
and modern topics, including social network analysis, applied
data mining and bioinformatics. Several of the authors also
shift accordingly. However, almost all of the venues remain the
same, likely due to their continued attraction owing to tradition and
established reputation.

5.3 Parameter Selection 

The s-index has two main parameters: the decay
factor $d$ and the walk length $m$. We select these parameters in a
principled fashion, which we describe here.

The decay factor $d$ is used to weight the influence of walks which
are $i$ steps away. It is similar to the damping factor used in
PageRank, which describes the "leakage probability" of web surfers upon
page visits. Although PageRank uses a damping factor based on
the observation that surfers typically follow on the order of 6
hyperlinks ($d = 1 - 0.15$), [?] notes that $d = 0.5$ is a better choice on
citation graphs based on the frequency of feed-forward loops (see
Figure 2) in real data. Thus, we choose $d = 0.5$ to denote that 50%
of the influence of a paper on its descendants is lost over each step.

$m$ denotes the maximum walk length over which influence is
computed. We choose a small $m = 4$ in practice for multiple
reasons: (a) the exponential influence decay already heavily
discounts walks to "far away" papers, as $m = 4$ already produces a
weight of only $d^4 = 1/16$, and (b) we expect that the content of far-away
papers loses relevance to the starting paper. Moreover, we
observe that the Spearman rank correlation $\rho$ between rankings for successive
values of $m$ rapidly approaches 1 after just a few steps: between $m = 3$ and $m = 4$, $\rho$ is
already > 0.999, with exponentially diminishing returns.
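The weights implied by $d = 0.5$ can be checked with exact fractions, as in this short sketch. The geometric-tail line is our own addition: it gives the combined per-walk weight of all lengths beyond $m = 4$, $\sum_{i>4} d^i$, and says nothing about how many longer walks a given paper actually has.

```python
from fractions import Fraction

d, m = Fraction(1, 2), 4
weights = [d ** i for i in range(1, m + 1)]  # I(i) = d^i for i = 1..m
tail = d ** (m + 1) / (1 - d)                # sum of d^i for i = m+1, m+2, ...
print([str(w) for w in weights])  # ['1/2', '1/4', '1/8', '1/16']
print(tail)                       # 1/16
```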

5.4 Scalability 

As described in Section 4.3, s-index computation for papers has
$O(m(|E_{pp}| + |P|))$ time complexity, which is linear
in the number of papers (nodes), citations (edges) and walk length
$m$. Computing the s-index for authors and venues incurs small
additional costs of $O(|E_{pa}| + |A|)$ and $O(|E_{pv}| + |V|)$ operations, respectively.
Figure 5 shows linear scaling with respect to walk length
$m$ and number of edges $|E_{pp}|$ for computing paper s-index on the
MAS graph using a MATLAB implementation. We have
additionally developed a Microsoft COSMOS (more generally, MS-SQL)
implementation which runs on the MAS graph in minutes and is
currently deployed and used regularly at Microsoft.


6. CONCLUSION 

In this work, we aim to improve upon the state of the art in
impact metrics used for quantifying scientific research productivity.
While quantitative evaluation is by no means a functional
replacement for carefully reading papers or qualitatively examining
author contributions and evaluating peer-reviewed reputation, impact
metrics are commonly used in managerial and strategic research
decisions involving assigning tenure, awarding prizes, appointing
academic posts, comparing researchers and deciding submission
venues. It is therefore important that these metrics be principled
and behave according to human intuition. In this work, we identify
several desiderata that impact metrics should obey in practice and
analyze how currently used state-of-the-art metrics violate these
properties. We then build towards the s-index
metric, which quantifies the impact of papers, authors and venues based on
influence propagated over a citation graph, and propose a fast,
scalable algorithm for its computation which is currently deployed and
used at Microsoft. We evaluate the s-index on a large citation graph
from Microsoft Academic Search and show promising results.